Black-white differences in chronic stress exposures to predict preterm birth: interpretable, race/ethnicity-specific machine learning model

Background: Differential exposure to chronic stressors by race/ethnicity may help explain Black-White inequalities in rates of preterm birth. However, researchers have not investigated the cumulative, interactive, and population-specific nature of chronic stressor exposures and their possible nonlinear associations with preterm birth. Models capable of computing such high-dimensional associations that could differ by race/ethnicity are needed. We developed machine learning models of chronic stressors to both predict preterm birth more accurately and identify chronic stressors and other risk factors driving preterm birth risk among non-Hispanic Black and non-Hispanic White pregnant women.
Methods: Multivariate Adaptive Regression Splines (MARS) models were developed for preterm birth prediction for non-Hispanic Black, non-Hispanic White, and combined study samples derived from the CDC’s Pregnancy Risk Assessment Monitoring System data (2012–2017). For each sample population, MARS models were trained and tested using 5-fold cross-validation. For each population, the Area Under the ROC Curve (AUC) was used to evaluate model performance, and variable importance for preterm birth prediction was computed.
Results: Among 81,892 non-Hispanic Black and 277,963 non-Hispanic White live births (weighted sample), the best-performing MARS models showed high accuracy (AUC: 0.754–0.765) and similar-or-better performance for race/ethnicity-specific models compared to the combined model. The number of prenatal care visits, premature rupture of membranes, and medical conditions were more important than other variables in predicting preterm birth across the populations. Chronic stressors (e.g., low maternal education and intimate partner violence) and their correlates predicted preterm birth only for non-Hispanic Black women.
Conclusions: Our study findings reinforce that mid- or upstream determinants of health, such as chronic stressors, should be targeted to reduce excess preterm birth risk among non-Hispanic Black women and ultimately narrow the persistent Black-White gap in preterm birth in the U.S.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12884-024-06613-w.


Background
Preterm birth (< 37 weeks' gestation) has a range of adverse effects on child health, academic, and social outcomes [1], as well as on parents and families (e.g., psychological distress) [2], and generates high educational and medical costs [3]. In 2021, 384,384 infants were born preterm in the U.S. [4]. The country's preterm birth rate rose 4% to 10.49% in 2021, the highest level reported since at least 2007 [4]. Importantly, the Black-White inequalities in preterm birth have persisted over the years, such that non-Hispanic Black women (14.75%) are approximately 1.5 times more likely than non-Hispanic White women (9.5%) to experience preterm birth [4]. Nevertheless, the underlying causes of this Black-White difference are not fully understood. Well-established maternal risk factors explain only about half of the preterm birth risk [5], and growing evidence attributes the remaining unexplained risk to chronic stress [1,6-10].
Study findings are mixed regarding the effects of chronic stress exposures (i.e., chronic stressors) on preterm birth [11,12]. One contributor to this inconsistency is the limited ability of study designs and models to capture the complexities of women's chronic stressors, which have been conceptualized variably across studies and out of racial/ethnic context [13]. Evidence shows racial/ethnic variations in three common chronic stressors of childbearing-aged women, namely financial hardship, perceived isolation, and direct/indirect experience of physical violence (e.g., intimate partner violence [IPV]) [14]. These findings suggest that previous conceptualizations of chronic stressors are likely compromised by assuming universal chronic stress experiences across groups defined by race/ethnicity. Moreover, existing statistical models that included stress variables assumed linear and independent associations among stressors and between stressors and outcomes. Such models are less effective in capturing the dynamics of chronic stressors that are synergistic, accumulate over time, and vary in types and effects by race/ethnicity.
To address these evidence gaps, the present study employed a more flexible and sophisticated method, namely machine learning, for more accurate computation of chronic stressors and subsequent prediction of preterm birth among non-Hispanic Black and non-Hispanic White women in the U.S. Machine learning gives computers the capability to learn from patterns and inference in data rather than from explicit instructions [15]. Machine learning is known for its robustness in handling high-dimensional data with many variables combined in non-linear fashions to predict outcomes or detect new patterns in data [16].
In recent decades, an increasing number of studies have used machine learning to predict preterm birth, in which they employed a wide spectrum of machine learning models, from linear regression to deep neural networks [17], and many different data types, including ultrasound imaging, diagnostic screening, fetal monitoring, and genetics [18]. Most of the prior studies used electronic health records data collected from local hospitals, and their variables in the models encompassed a combination of pregnant women's health conditions, procedures performed at the hospitals, prescriptions, or tests (e.g., bloodwork and ultrasound) [17,19,20]. Only a handful used large population data (e.g., national survey, administrative, or birth and death certificate data) that contained variables rich in socioeconomic, psychological, or behavioral factors beyond biomedical factors and were more representative of the population of pregnant women in the U.S. [21-24].
Furthermore, prior studies using machine learning focused more on improving model performance than on understanding the implications of those predictions, making the developed machine learning models opaque, unintuitive, or challenging for users to understand. Systems whose decisions cannot be well-interpreted are less likely to be trusted, particularly in healthcare [25]. Hence, there is a critical need to develop interpretable machine learning models that are trustworthy and high-performing in preterm birth prediction, whose outcomes can be reliably used for early identification and intervention with expecting mothers to prevent preterm birth.
This study aimed to (a) develop machine learning models of chronic stress exposures to predict preterm birth risk among non-Hispanic Black women, non-Hispanic White women, and racial/ethnic groups combined; (b) evaluate the models' prediction accuracy; and (c) identify and compare important features for preterm birth prediction among the three models. To the best of our knowledge, this study is the first to use interpretable machine learning models to investigate how various chronic stressors, along with sociodemographic, medical, and behavioral factors, predict preterm birth among non-Hispanic Black and non-Hispanic White pregnant women in the U.S. in the context of a national, population-based dataset.

Data source
This secondary data analysis used data from the Pregnancy Risk Assessment Monitoring System (PRAMS) linked with birth certificate data collected by the Centers for Disease Control and Prevention (CDC). This study used Phase 7 (2012-2015) and Phase 8 (2016-2017) data, the latest two phases, so that our findings reflect the most recent trends possible. PRAMS is an ongoing, population-based surveillance project established by the CDC to monitor maternal attitudes and experiences (e.g., perceived racial discrimination, stressful life events [SLEs]) before, during, and shortly after pregnancy [26]. Every month, each state participating in the PRAMS selects a sample of newly delivered mothers from live birth certificates by stratified random sampling without replacement (1,300 to 3,400 women each year) to receive a mailed-out questionnaire. The PRAMS questionnaire consists of two parts: core and standard/state-developed questions. The core questions are asked by all participating states, while the standard/state-developed questions are chosen from a pre-tested list of standard questions developed by the CDC or are developed by states on their own. As a result, each state's PRAMS questionnaire is unique, even though most items are shared across states. Questionnaires are mailed between 2 and 4 months after delivery and followed with a telephone interview for non-responders. The final PRAMS dataset is weighted for sample design, nonresponse, and noncoverage to allow the construction of population estimates representative of all women who gave birth in each state during survey years. The CDC PRAMS working group sets a response rate threshold of 55-70% depending on survey years to minimize nonresponse bias [27].

Inclusion criteria
The analytic sample consisted of first-time mothers who: (a) were aged younger than 50 years at the time of childbirth; (b) delivered live singleton births without birth defects; and (c) identified themselves as non-Hispanic Black or non-Hispanic White. We limited our sample to first-time mothers due to their higher risk of preterm birth than multiparous women [28]. Only birth mothers (not adoptive mothers) were subject to analysis to link birth mothers' chronic stress exposures to preterm birth. Although the CDC defined 16-49 years as childbearing age, our sample included mothers aged younger than 16 because the maternal age variable in the data was categorical, not allowing us to limit the sample to those aged 16-49 years. Also, only singleton births without birth defects were included because the causes and consequences of adverse birth outcomes in the case of multiple births and birth defects differ from those of singleton births without birth defects.
Originally, the dataset included 222,290 individuals. We excluded those who had a prior live birth (n = 32,630), gave birth to twins or higher-order multiples (n = 3,504) or an infant with a birth defect (n = 3,196), and were not of non-Hispanic Black or non-Hispanic White race/ethnicity (n = 54,975). This sequential elimination process reduced the initial sample size to 127,985. The final sample size was 78,356 after deleting observations with missing data. The largest number of missing values, 127,985, was observed in two variables (i.e., bleeding during pregnancy and pregnancy complications) (Table S1). With the survey weight considered, the final sample represented 359,855 women, with 81,892 non-Hispanic Black women and 277,963 non-Hispanic White women.
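The sample flow described above can be checked with simple arithmetic. The following Python sketch is illustrative only (the study itself was run in R); it uses the counts reported in the text, and the dictionary keys are descriptive labels chosen here, not variable names from the PRAMS data.

```python
# Counts as reported in the text; labels are illustrative, not PRAMS variable names.
initial = 222_290
exclusions = {
    "prior live birth": 32_630,
    "twins or higher-order multiples": 3_504,
    "birth defect": 3_196,
    "not non-Hispanic Black or White": 54_975,
}

# Apply the exclusions sequentially, as in the text.
remaining = initial
for reason, n in exclusions.items():
    remaining -= n
    print(f"after excluding {reason}: {remaining:,}")

# Complete cases retained after removing observations with missing data.
complete_cases = 78_356
dropped_for_missing = remaining - complete_cases

# Weighted population represented by the final sample.
weighted_black = 81_892
weighted_white = 277_963
weighted_total = weighted_black + weighted_white
print(f"dropped for missing data: {dropped_for_missing:,}")
print(f"weighted total: {weighted_total:,}")
```

The sequential exclusions reproduce the reported 127,985, and the two weighted group sizes sum to the reported 359,855.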

Measures
Forty-six of 669 variables were selected and modeled. Figure 1 is a flow diagram for variable selection. Although the study's focus was on chronic stressors as predictors for preterm birth, our models included other relevant factors that could mediate or confound the associations between chronic stressors and preterm birth based on the prior literature [7,11,29,30], such as one's sociodemographic, medical, and behavioral characteristics. Survey years and U.S. states were modeled to factor in potential temporal and spatial variations. However, several variables planned to be analyzed were removed from the analysis due to a substantial amount of missing data (social support, home visitor to help prepare for the new baby, perceived racial discrimination, and perceived neighborhood safety). A comprehensive description of the analyzed variables is provided in Table S2 in the supplemental materials.
Fig. 1 Flow diagram for variable selection

Chronic stressors
As external stressors, we analyzed health insurance coverage before and during pregnancy (yes/no), yearly total household income (with 12 levels), maternal educational attainment (0-8, 9-11, 12, 13-15, or 16+ years), receiving WIC (yes/no) as an indication of lower income, physical abuse by a husband/partner before and during pregnancy (yes/no), and 11 items regarding SLEs (yes/no). Some examples of SLEs were separation or divorce, homelessness, arguing with a husband/partner more than usual, and unwanted pregnancy by a husband/partner. As enhancers of stress, we analyzed psychological distress, such as depression before pregnancy (yes/no).

Medical factors
These included the number of pregnancy terminations in the past (as a continuous variable), the presence of pre-pregnancy health conditions (e.g., diabetes mellitus and chronic hypertension), pre-pregnancy body mass index (BMI) (as a continuous variable), gestational diabetes, pregnancy complications (e.g., fever and ruptured membrane), and other medical risk factors. Most of these items were answered yes or no.

Behavioral factors
These encompassed multivitamin intake (didn't take vitamin, 1-3 times/week, 4-6 times/week, or every day/week), pregnancy intention (later, sooner, then, did not want then or any time, or was not sure), the number of prenatal care (PNC) visits (<= 8, 9-11, or 12+), initiation of PNC in the first trimester (yes, no, or no PNC), and the number of cigarettes smoked (before pregnancy and during the 1st, 2nd, and 3rd trimesters).

Data analysis

Variable selection and handling of missing data
Of the final list of 76 variables associated with chronic stressors and preterm birth, we removed 30 variables during data pre-processing (e.g., merging variables, discarding variables with over 10% missing data [31], and discarding variables that were neither a predictor nor an outcome) (Fig. 1). However, as an exception, we included variables with a 10-12.2% missing rate to keep the annual household income variable (12.2% missing) in the analysis, as income was an essential indicator of socioeconomic status and a well-known source of chronic stress. By doing so, we automatically included such variables as a cut in work hours or pay of husband/partner/self (11.6% missing) and homelessness (11.4% missing). Ultimately, we analyzed 46 variables and 78,356 individuals (359,855 individuals after the application of sampling weight) who met the inclusion criteria and did not have missing data.

Descriptive statistics
We investigated the participants' characteristics and their associations with preterm birth, stratified by race/ethnicity. The characteristics were summarized with frequency, percentage, mean, and standard deviation. Racial/ethnic differences in the characteristics and their associations with preterm birth were examined using Chi-squared tests with Rao & Scott's second-order correction (for categorical variables) and Wilcoxon rank-sum tests (for continuous variables) for complex survey samples. The statistical significance was set at the alpha level of 5%.

Multivariate adaptive regression splines (MARS)
We used a MARS model to predict preterm birth among three groups: non-Hispanic Black women, non-Hispanic White women, and both. MARS is a nonparametric, multivariate regression method that can estimate complex non-linear relations by a series of spline (i.e., piecewise curve) functions of the predictor variables. As a nonparametric approach, MARS does not make any underlying assumptions about the distribution of the predictor variables [32]. MARS considers the relationships between each predictor variable and the outcome variable. For a given predictor variable, MARS partitions the range of that variable and fits individual linear regression models between partition points. These models are joined at these partition points, also called knots. The process continues through each predictor variable, producing a highly non-linear pattern [33]. Compared to polynomial regression, MARS is more robust at fitting non-linear curves to detect subgroup differences in risk-disease relationships [34]. Importantly, MARS can estimate the relative feature importance via the generalized cross-validation (GCV) criterion [35].
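The knot-based, piecewise-linear structure described above can be illustrated with a short Python sketch. This is not the study's code (the analysis was run in R). The knot location, coefficients, and the example predictor are hypothetical values chosen for illustration. The GCV function uses one commonly cited form of the criterion, in which the effective number of parameters grows with the number of terms through a per-knot penalty; the default penalty of 2 is an assumption.

```python
def hinge(x, knot, direction=+1):
    """MARS basis function: max(0, x - knot) for direction +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate an additive MARS-style model: intercept + sum of coef * hinge(x)."""
    return intercept + sum(coef * hinge(x, knot, d) for coef, knot, d in terms)


def gcv(rss, n_obs, n_terms, penalty=2.0):
    """Generalized cross-validation score (one common form; penalty value is an assumption).

    The effective number of parameters is n_terms + penalty * (n_terms - 1) / 2,
    so a model with more terms must reduce the residual sum of squares enough
    to justify its extra complexity.
    """
    effective_params = n_terms + penalty * (n_terms - 1) / 2.0
    return (rss / n_obs) / (1.0 - effective_params / n_obs) ** 2


# Hypothetical single-predictor model with one knot at 9 (e.g., number of visits):
# the slope below the knot differs from the slope above it.
terms = [(-0.05, 9, +1), (0.20, 9, -1)]  # (coefficient, knot, direction)
for visits in (6, 9, 12):
    print(visits, mars_predict(visits, intercept=0.3, terms=terms))
```

The two hinge terms share a knot, so the fitted line changes slope there while remaining continuous, which is how MARS joins its local linear fits.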

Model training and testing
The data were split into training and test sets at a 70/30 ratio. Given the unbalanced data, we partitioned the data so that the training and test sets each contained the same preterm-to-term birth ratio. We built three models: (a) a baseline model without interactions between the features; (b) a second-degree interaction model; and (c) a third-degree interaction model. We implemented 5-fold cross-validation to select the model with the smallest residual, which was evaluated on the test set later. Although MARS has two tuning parameters to minimize prediction error (the degree of interactions and the number of retained terms), we needed to tune only the degree of interactions since the cross-validation decided the optimal number of terms for the models. We limited our degree of interactions to three so as not to create an unnecessarily complicated model for the given data and to prevent overfitting.
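A stratified 70/30 partition of the kind described above can be sketched in a few lines of Python. This is an illustrative, standard-library-only sketch, not the study's R code. The function name, the seed, and the toy labels (16% positives, echoing the unweighted preterm rate) are assumptions made for the example.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) so that each set keeps roughly the same class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # randomize within each class
        cut = round(len(indices) * train_fraction)  # class-specific 70% cut point
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Toy data: 160 preterm (1) and 840 term (0) births.
labels = [1] * 160 + [0] * 840
train, test = stratified_split(labels)

train_rate = sum(labels[i] for i in train) / len(train)
test_rate = sum(labels[i] for i in test) / len(test)
print(len(train), len(test), train_rate, test_rate)
```

Splitting within each outcome class, rather than over the whole sample at once, is what keeps the preterm proportion equal in both sets.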
We also analyzed both original and weighted data to develop machine learning models for comparison. The original data were those collected by the CDC in a way that oversampled mothers of low-birth-weight infants, those living in high-risk geographic areas, and racial/ethnic minority groups [27]. The weighted data were those to which the sampling weight calculated and assigned by the CDC was applied to represent the population of women who gave birth in certain states and survey years. However, how to include sampling weight in machine learning models is not clearly documented, and developing weighted machine learning models requires extensive computing resources [36]. Therefore, we approximated weighted machine learning models by replicating each observation a number of times equal to its sampling weight rounded down to the nearest integer (e.g., 5.6 became 5) and training and testing machine learning models on those replicated data. The mean of the sampling weight was 50.94 (range: 1.00-1131.58).
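The replication step described above amounts to repeating each record a whole number of times. A minimal Python sketch follows. It is illustrative only; the function name and the toy records and weights are assumptions, not values from the PRAMS data.

```python
import math


def replicate_by_weight(rows, weights):
    """Repeat each row floor(weight) times to approximate a weighted sample."""
    replicated = []
    for row, weight in zip(rows, weights):
        replicated.extend([row] * math.floor(weight))
    return replicated


# Toy records with hypothetical sampling weights.
rows = ["a", "b", "c"]
weights = [5.6, 1.0, 2.9]
replicated = replicate_by_weight(rows, weights)
print(replicated)
```

One consequence of rounding down is that the fractional part of every weight is discarded: here the weights sum to 9.5 but only 8 replicated records are produced.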
Finally, we calibrated our best-performing model for each population. Specifically, we employed logistic, isotonic, and beta calibration methods for comparison, chose the method generating the best result, and tested the calibrated model on the test set for each population.
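Of the three calibration methods named above, isotonic calibration is the simplest to sketch without external libraries. The Python example below uses the pool-adjacent-violators procedure on toy data. It is an illustration under stated assumptions (the function name and the toy scores and outcomes are invented here) and does not reproduce the study's R implementation. Because the fitted map is non-decreasing, it changes predicted probabilities while largely preserving how observations are ranked.

```python
def isotonic_fit(scores, outcomes):
    """Pool-adjacent-violators: return (sorted scores, non-decreasing calibrated values)."""
    pairs = sorted(zip(scores, outcomes))

    # Each block stores [sum of outcomes, count]; adjacent blocks are merged
    # whenever an earlier block has a higher mean than the one after it.
    blocks = []
    for _, y in pairs:
        blocks.append([float(y), 1])
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count

    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [s for s, _ in pairs], fitted


# Toy uncalibrated scores and observed outcomes (1 = preterm).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
outcomes = [0, 1, 0, 0, 1, 1]
sorted_scores, calibrated = isotonic_fit(scores, outcomes)
print(list(zip(sorted_scores, calibrated)))
```

In the toy data, the three middle observations are pooled into one block whose calibrated value is their observed event rate.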

Model evaluation metric
We evaluated the performance of each model via the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC). The ROC curve represents the trade-off between the true-positive and the false-positive rates, and the AUC summarizes that trade-off in a single number. In ROC analysis, a diagonal identity line starting at zero indicates that the output is a random guess, whereas an ideal classifier with a high true-positive rate (sensitivity) and a low false-positive rate (1-specificity) will curve positively and strongly toward the upper left quadrant of the plot.
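The AUC has an equivalent rank-based reading: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The Python sketch below computes it that way on toy data. The function name and the convention of counting ties as one half are choices made for this illustration, not details taken from the study.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy predicted risks and outcomes (1 = preterm).
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

A value of 0.5 corresponds to the diagonal line of a random guess, and 1.0 to a classifier that ranks every positive above every negative. The pairwise loop is quadratic in sample size, so a sort-based version would be preferable for large data.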

Interpretability
We used an inherently interpretable (white-box) model, MARS, and also interpreted the trained MARS models post hoc [25]. As mentioned earlier, MARS can compute the feature importance via the GCV. For the post hoc analysis, we analyzed our models by assessing the feature importance and feature effect (via partial dependence plot [37] and individual conditional expectation curve [38]). All data analysis was conducted using R version 4.0.2 (2020-06-22).
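The two feature-effect tools named above are closely related: an individual conditional expectation (ICE) curve traces one observation's prediction as a single feature is varied, and a partial dependence curve is the average of those ICE curves. The Python sketch below shows the relationship. The stand-in model `toy_predict`, its coefficients, and the feature names `pnc_visits` and `prom` are hypothetical and are not the fitted MARS models from this study.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per row: predictions as `feature` sweeps `grid`, other values held fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the original row is unchanged
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, rows, feature, grid):
    """Average the ICE curves at each grid value."""
    curves = ice_curves(predict, rows, feature, grid)
    return [sum(c[j] for c in curves) / len(curves) for j in range(len(grid))]


def toy_predict(row):
    """Hypothetical risk model: risk rises below 9 visits and with PROM."""
    return 0.05 + 0.02 * max(0, 9 - row["pnc_visits"]) + 0.10 * row["prom"]


rows = [{"pnc_visits": 12, "prom": 0}, {"pnc_visits": 6, "prom": 1}]
grid = [3, 6, 9, 12]
pd_curve = partial_dependence(toy_predict, rows, "pnc_visits", grid)
print(pd_curve)
```

Plotting the ICE curves alongside their average helps reveal whether a feature's effect is similar for everyone or differs across individuals.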

Subject characteristics and associations with preterm birth
Table 1 illustrates maternal characteristics among the weighted sample populations. Overall, 7.1% of the women experienced preterm birth across racial/ethnic groups, with non-Hispanic Black women being 1.72 times more likely to experience preterm birth than non-Hispanic White women (11% vs. 6.4%). For the original (unweighted) sample populations that oversampled high-risk women, 16% of the women experienced preterm birth overall, with non-Hispanic Black women being 1.13 times more likely to experience preterm birth than non-Hispanic White women (17% vs. 15%) (Table S3).
Non-Hispanic Black women tended to give birth at a younger age and outside a marital relationship. Relative to non-Hispanic White women, non-Hispanic Black women had worse socioeconomic (e.g., lower income and education), psychological (e.g., more exposure to physical abuse by the partner and SLEs), medical (e.g., higher rates of diabetes and hypertension before pregnancy), and behavioral risk profiles (e.g., unintended pregnancy and fewer PNC visits), with three exceptions: non-Hispanic White women were more likely to have people close to them with drinking/drug problems, to experience depression before pregnancy, and to smoke before and during pregnancy. Similar patterns were observed in the weighted (replicated) data (Table S4).
Table 2 presents the weighted preterm birth rates by maternal characteristics. Given the same distribution characteristics, non-Hispanic Black women were generally more likely to experience preterm birth than their non-Hispanic White counterparts. We observed state-level variations in preterm birth rate within and between the racial/ethnic groups (data not shown). We found a maternal age trajectory of preterm birth rate distinct to each racial/ethnic group, in which non-Hispanic Black women showed a maternal age-related increase in preterm birth rate (known as weathering), whereas non-Hispanic White women showed a typical U-shaped pattern with a higher preterm birth rate at the extremes of maternal age and a nadir at 30-34 years of age.
Preterm birth was significantly associated with all risk factors except for unwanted pregnancy by husband/ For continuous variables, non-Hispanic Black women with preterm birth experienced more terminations of pregnancy in the past and had higher pre-pregnancy BMI than their non-Hispanic White counterparts. In contrast, non-Hispanic White women with preterm birth were 1.75-2.14 times more likely than their non-Hispanic Black counterparts to smoke before and during pregnancy. Similar patterns were observed in the original (Table S5) and weighted (replicated) data (Table S6).

Model performance and calibration
We compared the unweighted and weighted (replicated) models according to the study population (pooled, non-Hispanic Black, and non-Hispanic White), interaction (no interaction, 2-way interaction, and 3-way interaction), and dataset (training and test) (Table 3). Each best-performing model had a different number of terms selected by MARS to produce the smallest model errors.
The weighted (replicated) models differed from the unweighted models in their prediction accuracy and best-performing model. Overall, the accuracy of the weighted (replicated) models was lower than that of the unweighted models across the different modeling conditions. The weighted (replicated) models performed the best with 3-way interactions among the pooled (AUC = 0.758) and non-Hispanic Black populations (AUC = 0.757) and with no interactions among the non-Hispanic White population (AUC = 0.765). When evaluated on the test set, the accuracy of the three models was maintained. The accuracy of the calibrated models, whether unweighted or weighted, was identical to that of the uncalibrated models in our study (Table S7).
Table footnotes:
3 There was no significant association between preterm birth and marital status, health insurance before pregnancy, total annual income, number of household members, WIC during pregnancy, physical abuse before and during pregnancy, stressful life events, gestational diabetes, intake of multivitamins, and pregnancy intention among the non-Hispanic Black population.
4 The total incomes shown in the table indicate values only from the Phase 7 data. The Phase 8 data have different values (slightly higher than those from Phase 7) in each category after taking inflation into account. However, both Phases have 12 income categories, which were entered into the models as income tiers. The adjusted amount of income under each category from Phase 8 can be found in Table S2.
5 There was no significant association between preterm birth and unwanted pregnancy by husband/partner and fever during pregnancy among the pooled and non-Hispanic White populations.

Feature importance and effect
We identified important features in predicting preterm birth risk among non-Hispanic Black and non-Hispanic White women using two different methods (i.e., GCV-based vs. permutation-based) from the weighted (replicated) data for generalizability (Table 4, Fig. 2, and Fig. 3). Important features from the unweighted data can be found in the supplemental materials (Figure S1 and Figure S2). Despite some variations, both methods generally came to the same conclusion. We found important features both common and distinct to each racial/ethnic group. Specifically, the number of PNC visits, premature rupture of membranes (PROM), and medical risk factors were the top three important features across the racial/ethnic groups, although their degrees of importance varied according to the method applied and the study population. In addition, non-Hispanic Black women had more important features for preterm birth prediction than non-Hispanic White women (26 vs. 6 important features via GCV). Unlike non-Hispanic White women, the important features of non-Hispanic Black women included a range of chronic stressors, such as physical abuse during pregnancy, maternal education, SLEs (i.e., imprisonment of husband/partner/self, move to a new address), and household income. Moreover, hypertension before pregnancy, states (i.e., GA, CO, and LA), history of pregnancy termination, BMI before pregnancy, multivitamin intake, and smoking in the second and third trimesters of pregnancy were identified as important among non-Hispanic Black women. Most of these factors are known to be associated with chronic stress. On the other hand, the initiation of PNC in the first trimester and BMI before pregnancy were identified as important among non-Hispanic White women.
For the effects of the top three important features, the predicted probability of preterm birth was greater when pregnant women (in both racial/ethnic groups) had fewer PNC visits, experienced PROM, and had medical risk factors (Figure S3 and Figure S4).
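The permutation-based importance mentioned in this section can be sketched as follows: shuffle one feature's values across observations, re-score the model, and record how much performance drops. The Python example below is a minimal illustration under stated assumptions. The toy model, the feature names `prom` and `noise`, the use of accuracy as the metric, and the number of repeats are all invented for the example and do not reflect the study's R analysis.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows whose predicted class matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, feature, metric,
                           n_repeats=20, seed=0):
    """Mean drop in `metric` after shuffling `feature` across rows."""
    rng = random.Random(seed)
    baseline = metric(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild rows with the shuffled column; other features are untouched.
        permuted = [dict(row, **{feature: v}) for row, v in zip(rows, shuffled)]
        drops.append(baseline - metric(predict, permuted, labels))
    return sum(drops) / n_repeats


# Toy data: the hypothetical model uses only `prom`; `noise` is ignored.
prom_values = [1, 1, 1, 1, 0, 0, 0, 0]
rows = [{"prom": p, "noise": i % 3} for i, p in enumerate(prom_values)]
labels = list(prom_values)


def toy_predict(row):
    return row["prom"]


imp_prom = permutation_importance(toy_predict, rows, labels, "prom", accuracy)
imp_noise = permutation_importance(toy_predict, rows, labels, "noise", accuracy)
print(imp_prom, imp_noise)
```

A feature the model never uses receives an importance of exactly zero, whereas shuffling a feature the model relies on degrades performance.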

Discussion
Although substantial evidence points to robust racial/ethnic disparities in preterm birth in the U.S., the drivers of these disparities remain unclear. To address this issue, we built interpretable and race/ethnicity-specific MARS models to predict preterm birth among non-Hispanic Black and non-Hispanic White pregnant women in the U.S. using a large, nationally representative dataset. More specifically, we compared the prediction accuracy between the models with different specifications, as well as with different datasets: original (unweighted) data that oversampled high-risk pregnant women and weighted data more representative of the pregnant women in the U.S. Importantly, we found commonalities and differences in the important features for preterm birth prediction between non-Hispanic Black and non-Hispanic White women. The number of PNC visits, PROM, and medical risk factors were the most important features for both racial/ethnic groups. Only the non-Hispanic Black model identified several chronic stressors and their medical and behavioral correlates as important features for preterm birth prediction, findings that were masked in the pooled model.
Existing studies have employed a wide spectrum of machine learning models to predict preterm birth, from linear regression to deep learning [20,39-43]. Although MARS is simple yet sophisticated enough to model non-linearity and transparent enough to reveal the features that drive prediction, few studies have used MARS models to predict preterm birth among non-Hispanic Black and non-Hispanic White pregnant women. Our MARS models performed better than linear models in some prior studies [23,40], supporting the argument that no simple linear hyperplanes could separate preterm birth from term birth [44]. Moreover, our MARS models' prediction accuracy was higher than that of prior studies using different machine learning models and national datasets, including the PRAMS data [21-23].
We observed an approximately 6-7% reduction in AUC from the unweighted to the weighted models on the test set across the study populations. This finding was consistent with that of MacNell and colleagues, who fit gradient boosting models on the National Health and Nutrition Examination Survey data to predict all-cause mortality and reported that the unweighted model performance was inflated compared to the weighted model (F1 score: 81.9% vs. 77.4%) [36].
We also found a multitude of important features for preterm birth prediction unique to non-Hispanic Black and non-Hispanic White women. The identified risk factors for preterm birth are supported by the existing literature. Both non-Hispanic Black and non-Hispanic White models identified the number of PNC visits [45], PROM [46], and medical conditions [1] as the most robust predictors for preterm birth. Unlike the non-Hispanic White model, however, the non-Hispanic Black model identified an extensive list of predictors that included chronic stressors and their correlates, namely hypertension before pregnancy [47-49], history of pregnancy termination [50], maternal education [51,52], maternal BMI [48], multivitamin intake [53], smoking [48,49,54], IPV [55,56], SLEs (i.e., move and imprisonment) [57,58], gestational diabetes [48,49], maternal age [11,49], and household income [59]. In addition, we observed state-level differences in the predicted preterm birth risk within and across the study populations [60]. The identified important features for non-Hispanic Black women reflect the unjust and racialized social structure in the U.S., such as unequal opportunities and access to individual and neighborhood resources, as well as racial bias in the criminal justice system, which has a trickle-down effect on individual women's medical conditions, behaviors, and ultimately preterm birth. The country's extensive current efforts in reducing maternal morbidity and mortality should be directed toward health policies that tackle upstream social determinants of health. In the same vein, healthcare systems should institutionalize policies to address their patients' social needs to achieve optimal clinical outcomes. Healthcare systems can develop their own programs, and multiple resource referral platforms (e.g., findhelp.org) already exist through which healthcare providers can connect their patients to information and referral systems for community resources [61,62].
Despite some overlaps, however, the current and previous studies showed some variations in the important features for preterm birth prediction. For example, whereas Lee et al. [43] found hypertension, BMI, cervical length, and age, Tran et al. [44] indicated multiple fetuses, cervix incompetence, and prior preterm birth as important features. On the other hand, Gao et al. [20] indicated twin pregnancy, systemic lupus erythematosus, short cervical length, hypertensive disorder, and hydroxychloroquine sulfate. The observed differences can be attributed to the different datasets, models, and analytic populations (e.g., primiparous or multiparous women). In particular, we noticed that the important features in other studies were predominantly represented by biomedical risk factors. In contrast, our study's important features for non-Hispanic Black women also encompassed various social determinants of health, like chronic stressors, beyond biomedical factors.
The unique sets of important features for preterm birth prediction could be found only because we stratified the data and developed race/ethnicity-specific models. In our data, non-Hispanic White women were disproportionately represented; hence, the non-Hispanic White and the pooled models performed similarly. By stratifying the data according to race/ethnicity, we were able to train our models on data that had a fair representation of each racial/ethnic group to predict preterm birth with higher accuracy and less bias. Treating race/ethnicity as a marker for differential experiences of and exposure to chronic stress could help overcome limitations of the decontextualized chronic stress models, such as low accuracy or inconclusive association, as the lived experience of each racial/ethnic population is closely linked to their unique cultural, social, regional, and historical contexts, making each population's experience different from others [63]. Our study findings supported this premise: they showed predictors of preterm birth that are distinct to each racial/ethnic group and, for shared predictors, differences in the magnitude of their contribution to preterm birth risk. Moreover, prior studies reported that race/ethnicity-specific machine learning models outperformed race/ethnicity-combined machine learning models [32,64]. The accuracy of our weighted models followed a similar pattern, in which the non-Hispanic Black model performed similarly to the pooled model, and the non-Hispanic White model outperformed the pooled model.
All these promising findings of MARS models, or broadly machine learning models, however, should be interpreted and applied with caution since the field, albeit exponentially growing, is still nascent in healthcare. The prediction of preterm birth is bound by the dataset used; hence, the same model can result in different prediction outcomes in different settings with different populations of pregnant women. We should also be vigilant of the potential bias of machine learning models as they can easily overfit data and may not work well in real-world settings, which could harm individuals in the worst-case scenario. Importantly, machine learning can propagate bias in underlying data, producing skewed knowledge and contributing to exacerbated health inequalities. Although MARS is a white box model through which users can learn what factors drove the prediction, many others are black box models, compromising the models' transparency, interpretability, and trust between developers and users (e.g., patients and healthcare providers).
Further, we should acknowledge numerous challenges to translating machine learning models in research into practice to assist practitioners who serve pregnant women. Examples include the collection of high-quality data, effective data management and data governance strategies, a pipeline for data processing and machine learning with a user-friendly front end, and legal procedures and protection, among others [65]. Therefore, machine learning should be harnessed with balanced views, keeping its promises and perils in mind. We believe that our race/ethnicity-specific, interpretable, and weighted machine learning models using nationally representative data can contribute to the continuation of this important discussion moving forward.

Limitations
First, the variables available for analysis were limited because we used secondary data. Specifically, we could not predict different subtypes of preterm birth, namely spontaneous preterm labor, preterm premature rupture of membranes (PPROM), and medically indicated preterm birth, since the only variable available was a clinical estimate of gestational age. Consequently, grouping the subtypes of preterm birth into one outcome may have obscured more nuanced predictors for each subtype, requiring cautious interpretation and application of the study findings. For the same reason, we also included only individual-level factors to predict preterm birth. Although the PRAMS data contained social support, perceived racial discrimination, and perceived neighborhood safety, we excluded them from the analysis due to their high volume of missing data. These variables were collected by only a few states, and not always at the same time. If we had modeled these social determinants of health, the current important features and their order in predicting preterm birth may have changed. Also, considering that racial discrimination and unsafe neighborhoods are deemed significant risk factors for preterm birth in racial/ethnic minority communities, our non-Hispanic Black model may have underperformed without those variables.
Second, our study inherited limitations of self-reported measures, such as recall bias, social desirability bias, and response bias stemming from differences in how survey respondents understand or perceive the questions asked.
Third, our weighted models did not directly factor in the survey's sampling weight due to the extensive computing power required. To mitigate the problem, we took an alternative approach that replicated each observation by its assigned weight value. The prediction outcomes of our pseudo-weighted models likely deviate from those of true weighted models. Nevertheless, the distribution of maternal characteristics was very similar when directly applying the sampling weight and when using the replicated data in lieu of the sampling weight, which suggests that the discrepancies in prediction outcomes may be small.
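The replication approach described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy records, and the `weight` field are hypothetical, and rounding each weight up to the next integer is an assumption, since the text states only that each observation was replicated by its weight value. The sketch also compares a characteristic's distribution under direct weighting and under replication, mirroring the check described in the paragraph.

```python
import math
from collections import Counter


def replicate_by_weight(rows, weight_key="weight", rounding=math.ceil):
    """Return a new list in which each row appears rounding(weight) times.
    The rounding rule (ceiling) is an assumption made for this sketch."""
    replicated = []
    for row in rows:
        copies = max(int(rounding(row[weight_key])), 0)
        replicated.extend(dict(row) for _ in range(copies))
    return replicated


def weighted_share(rows, key, weight_key="weight"):
    """Share of each category when sampling weights are applied directly."""
    total = sum(r[weight_key] for r in rows)
    shares = Counter()
    for r in rows:
        shares[r[key]] += r[weight_key] / total
    return dict(shares)


def replicated_share(rows, key):
    """Share of each category by simple counting in the replicated data."""
    counts = Counter(r[key] for r in rows)
    return {k: v / len(rows) for k, v in counts.items()}


# Hypothetical toy records; real survey weights would come from the dataset.
sample = [
    {"education": "<16 years", "weight": 120.4},
    {"education": "16+ years", "weight": 80.7},
    {"education": "<16 years", "weight": 45.2},
]

expanded = replicate_by_weight(sample)
direct = weighted_share(sample, "education")
pseudo = replicated_share(expanded, "education")
print(len(expanded), direct, pseudo)
```

Because rounding changes each weight by less than one unit, the two distributions differ only slightly when weights are large, which is consistent with the similarity the authors report; how much the fitted models differ is a separate question that this sketch does not address.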
Fourth, our input features did not include biological measures, including stress biomarkers. Considering the potential heterogeneity of biological responses to chronic stressors among different racial/ethnic groups of pregnant women, the inclusion of stress biomarkers as input features could improve the prediction accuracy and help us find biomarkers for intervention unique to non-Hispanic Black and non-Hispanic White women. However, our models that predicted preterm birth with demographic, psychosocial, medical, and behavioral factors may be more likely to be used, particularly in under-resourced settings where certain testing, often expensive, and resultant biomarker data are not available.
Lastly, although our models showed high prediction accuracy in general, the prediction accuracy for preterm birth cases was poor, a finding not uncommon in other studies [66]. One of the main reasons is likely data imbalance, with a far smaller number of preterm birth cases than term birth cases. We neither oversampled preterm birth cases nor undersampled term birth cases because such data preprocessing could generate artificial data that may have little in common with real observations and infuse bias into the models. Nonetheless, we acknowledge that using the imbalanced data in this study could have undermined the prediction accuracy for preterm birth cases. Future studies will analyze more balanced data using under- and oversampling techniques to investigate variations in prediction accuracy and important features between models with different conditions. In addition to the data imbalance, it is possible that preterm birth cases did not capture all characteristics of term births or that many preterm birth cases carried information similar to that of term births [20].
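A short worked example may help show why overall accuracy can remain high while accuracy for the minority class is poor. The sketch below uses entirely hypothetical counts (100 preterm and 900 term births, with most preterm cases missed); none of these numbers come from the study, and the function name is illustrative.

```python
def class_wise_metrics(y_true, y_pred):
    """Overall accuracy plus sensitivity (preterm) and specificity (term)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical imbalanced sample: 1 = preterm birth, 0 = term birth.
y_true = [1] * 100 + [0] * 900
# Hypothetical predictions: 20 of 100 preterm cases detected,
# 880 of 900 term cases classified correctly.
y_pred = [1] * 20 + [0] * 80 + [1] * 20 + [0] * 880

metrics = class_wise_metrics(y_true, y_pred)
print(metrics)
```

Here the overall accuracy is 0.90 even though only 20% of preterm cases are detected, because the majority class dominates the total. Reporting class-wise metrics alongside overall accuracy makes this pattern visible.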

Conclusions
In conclusion, the U.S. continues to experience persistent Black-White inequalities in rates of preterm birth. Although the causes of such inequalities are complex, chronic stress is acknowledged as a highly plausible and potentially major contributor to these inequalities [67]. Therefore, predicting preterm birth and examining the contribution of chronic stress to its prediction are important research directions. Our study further established interpretable, race/ethnicity-specific machine learning models as a useful tool for building risk prediction systems that can identify the key factors behind preterm birth prediction unique to non-Hispanic Black and non-Hispanic White pregnant women, for targeted prevention and intervention. Although our models did not consider all the risk and protective factors for preterm birth, including biomarkers, we found that multiple chronic stressors and their correlates made a significant and unique contribution to preterm birth prediction among non-Hispanic Black but not non-Hispanic White women. This finding indicates that more efforts are called for to tackle the identified chronic stressors (mid or upstream social determinants of health) to alleviate the Black-White inequalities in preterm birth.
Considering that good models can only come from good data, we call for national surveillance systems in all U.S. states to collect multi-level social determinants of health beyond the individual level, as well as more clinically meaningful outcome variables (e.g., subtypes of preterm birth). This can make research findings more applicable or translatable to health policies and practices in the real world to prevent preterm birth among vulnerable women.
Moreover, given the complex and heterogeneous mechanisms underlying preterm birth among different racial/ethnic groups in different geographical and social contexts, an interdisciplinary approach combining data science with traditional epidemiological or qualitative research is critical to shedding light on these mechanisms and tackling inequalities in preterm birth. We find it a promising avenue for future studies to model multi-level determinants of health (from physiological to structural factors) with more powerful machine learning models (e.g., deep learning or hybrid) to accurately predict different subtypes of preterm birth among women with intersecting identities. Especially given that the information on the majority cases (i.e., term birth) tends to drive the model's overall prediction accuracy, developing machine learning models that can detect preterm birth, separate from term birth, will be an important task.
Note. BMI = Body Mass Index, CO = Colorado, GA = Georgia, LA = Louisiana, MA = Massachusetts, PNC = Prenatal Care, PROM = Premature Rupture of Membrane
1 Weighted data are technically pseudo-weighted, mimicking the inclusion of sampling weight in the models by replicating each observation in the data by the highest integer value of the weight variable assigned to each observation

Fig. 2 Feature importance in preterm birth risk prediction among N-H black women (weighted/replicated data)

Fig. 3 Feature importance in preterm birth risk prediction among N-H white women (weighted/replicated data)

Table 1
Sample characteristics by maternal race/ethnicity among weighted sample populations
partner and fever during pregnancy for non-Hispanic White women. On the other hand, a smaller set of risk factors - shorter duration of maternal education (i.e., < 16+ years), adverse health outcomes before pregnancy (i.e., depression, diabetes, and hypertension), fever during pregnancy, medical risk factors, premature rupture of membrane (PROM), absent or delayed PNC, and fewer numbers of PNC - increased preterm birth risk for non-Hispanic Black women.
Note. PNC = prenatal care, WIC = Special Supplemental Nutrition Program for Women, Infants, and Children
1 n (%) for categorical variables and mean (SD) for continuous variables
2 Chi-squared tests with Rao & Scott's second-order correction (for categorical variables) and Wilcoxon rank-sum tests (for continuous variables) were conducted for complex survey samples
3 The total incomes shown in the table indicate values only from the Phase 7 data. The Phase 8 data have different values (slightly higher than those from Phase 7) in each category after taking inflation into account. However, both Phases have 12 income categories, which were entered into the models as income tiers. The adjusted amount of income under each category from Phase 8 can be found in Table S2

Table 2
Number and rates of preterm birth by maternal characteristics among weighted sample populations

Table 4
Generalized cross-validation-based variable importance of the best-performing models with weighted (replicated) 1 data